AITopics | adversarial noise

Non-negative matrix factorization is a popular tool for decomposing data into feature and weight matrices under non-negativity constraints. It enjoys practical success but is poorly understood theoretically. This paper proposes an algorithm that alternates between decoding the weights and updating the features, and shows that assuming a generative model of the data, it provably recovers the groundtruth under fairly mild conditions. In particular, its only essential requirement on features is linear independence. Furthermore, the algorithm uses ReLU to exploit the non-negativity for decoding the weights, and thus can tolerate adversarial noise that can potentially be as large as the signal, and can tolerate unbiased noise much larger than the signal. The analysis relies on a carefully designed coupling between two potential functions, which we believe is of independent interest.

algorithm, artificial intelligence, machine learning, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)

Add feedback

ca24eb48806df3af49e5ac59d8a46f67-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-29-2026, 18:35:12 GMT

artificial intelligence, machine learning, noise, (19 more...)

Neural Information Processing Systems

Country:

Europe > Russia (0.28)
Asia (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Accelerated Zeroth-order Method for Non-Smooth Stochastic Convex Optimization Problem with Infinite Variance

Neural Information Processing SystemsApr-29-2026, 18:35:08 GMT

In this paper, we consider non-smooth stochastic convex optimization with two function evaluations per round under infinite noise variance. In the classical setting when noise has finite variance, an optimal algorithm, built upon the batched accelerated gradient method, was proposed in [17]. This optimality is defined in terms of iteration and oracle complexity, as well as the maximal admissible level of adversarial noise. However, the assumption of finite variance is burdensome and it might not hold in many practical scenarios. To address this, we demonstrate how to adapt a refined clipped version of the accelerated gradient (Stochastic Similar Triangles) method from [35] for a two-point zero-order oracle. This adaptation entails extending the batching technique to accommodate infinite variance -- a non-trivial task that stands as a distinct contribution of this paper.

artificial intelligence, machine learning, noise, (19 more...)

Neural Information Processing Systems

Country: Europe > Russia (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Appendix

Neural Information Processing SystemsApr-27-2026, 21:48:07 GMT

Our results heavily rely on the specific nature of the periodic activation function, so a natural question is to which extent our results can be extended beyond the single periodic neuron class. For lower bounds, a challenging but very interesting generalization would be to establish the cryptographic-hardness of learning certain family of GLMs whose activation function does not need to be periodic. A potentially easier route forward on this direction, would be to consider the Hermite decomposition of the activation function, similar to [A3], and establish lower bounds on the performance of low-degree methods [A23], of SGD [A3], or of local search methods methods [A15], for activation functions whose low-degree Hermite coefficients are exponentially small. For upper bounds, we believe that our proposed LLL-based algorithm may be extended beyond learning even periodic activation functions, such as the cosine activation, by appropriately post-processing the measurements, but leave this for future work. Furthermore, it would be interesting to better understand (empirically or analytically) the noise tolerance of our LLL-based algorithm for "low-frequency" activation functions, such as the absolute value underlying the phase retrieval problem which has "zero" frequency.

artificial intelligence, exp, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback